Goto

Collaborating Authors

 image model


Learning to See by Looking at Noise

Neural Information Processing Systems

Current vision systems are trained on huge datasets, and these datasets come with costs: curation is expensive, they inherit human biases, and there are concerns over privacy and usage rights. To counter these costs, interest has surged in learning from cheaper data sources, such as unlabeled images. In this paper, we go a step further and ask if we can do away with real image datasets entirely, by learning from procedural noise processes. We investigate a suite of image generation models that produce images from simple random processes. These are then used as training data for a visual representation learner with a contrastive loss. In particular, we study statistical image models, randomly initialized deep generative models, and procedural graphics models. Our findings show that it is important for the noise to capture certain structural properties of real data but that good performance can be achieved even with processes that are far from realistic. We also find that diversity is a key property for learning good representations.



P2P: TuningPre-trainedImageModelsforPoint CloudAnalysiswithPoint-to-PixelPrompting SupplementalMaterial

Neural Information Processing Systems

From the classification ablation results, summation is better than max-pooling and mean-pooling. After migrating them to point cloud analysis with Point-to-Pixel Prompting, we report the number of trainable parameters (Tr. Methodmax mean sum Accuracy 92.2 92.3 92.7 (b)SegmentationAblations.


P2P: TuningPre-trainedImageModelsforPoint CloudAnalysiswithPoint-to-PixelPrompting

Neural Information Processing Systems

However,itisnon-trivialtopromote such a pretraining-tuning paradigm to the 3D vision, given the limited training data that are relatively inconvenient to collect.


ProceduralImageProgramsforRepresentation Learning

Neural Information Processing Systems

Existing work focuses on ahandful ofcurated generativeprocesses which require expert knowledge to design, making it hard to scale up. To overcome this, we propose training with alarge dataset of twenty-one thousand programs, each one generating adiverse setofsynthetic images.



Hands On With Google's Nano Banana Pro Image Generator

WIRED

Google's latest AI image model is vastly better than the previous release at generating text in images. You can expect companies to go buck wild with this update. Nano Banana Pro generated this image, assembling a crowd of standalone characters into one scene. Corporate AI slop feels inescapable in 2025. From website banner ads to outdoor billboards, images generated by businesses using AI tools surround me.



Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video

arXiv.org Artificial Intelligence

Typically, children start to learn their first words between 6 and 9 months, linking spoken utterances to their visual referents. Without prior knowledge, a word encountered for the first time can be interpreted in countless ways; it might refer to any of the objects in the environment, their components, or attributes. Using longitudinal, egocentric, and ecologically valid data from the experience of one child, in this work, we propose a self-supervised and biologically plausible strategy to learn strong visual representations. Our masked autoencoder-based visual backbone incorporates knowledge about the blind spot in human eyes to define a novel masking strategy. This mask and reconstruct approach attempts to mimic the way the human brain fills the gaps in the eyes' field of view. This represents a significant shift from standard random masking strategies, which are difficult to justify from a biological perspective. The pre-trained encoder is utilized in a contrastive learning-based video-text model capable of acquiring word-referent mappings. Extensive evaluation suggests that the proposed biologically plausible masking strategy is at least as effective as random masking for learning word-referent mappings from cross-situational and temporally extended episodes.


Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models

arXiv.org Artificial Intelligence

Social media has exacerbated the promotion of Western beauty norms, leading to negative self-image, particularly in women and girls, and causing harm such as body dysmorphia. Increasingly content on the internet has been artificially generated, leading to concerns that these norms are being exaggerated. The aim of this work is to study how generative AI models may encode 'beauty' and erase 'ugliness', and discuss the implications of this for society. To investigate these aims, we create two image generation pipelines: a text-to-image model and a text-to-language model-to image model. We develop a structured beauty taxonomy which we use to prompt three language models (LMs) and two text-to-image models to cumulatively generate 5984 images using our two pipelines. We then recruit women and non-binary social media users to evaluate 1200 of the images through a Likert-scale within-subjects study. Participants show high agreement in their ratings. Our results show that 86.5% of generated images depicted people with lighter skin tones, 22% contained explicit content despite Safe for Work (SFW) training, and 74% were rated as being in a younger age demographic. In particular, the images of non-binary individuals were rated as both younger and more hypersexualised, indicating troubling intersectional effects. Notably, prompts encoded with 'negative' or 'ugly' beauty traits (such as "a wide nose") consistently produced higher Not SFW (NSFW) ratings regardless of gender. This work sheds light on the pervasive demographic biases related to beauty standards present in generative AI models -- biases that are actively perpetuated by model developers, such as via negative prompting. We conclude by discussing the implications of this on society, which include pollution of the data streams and active erasure of features that do not fall inside the stereotype of what is considered beautiful by developers.